13  Numerical Integration

Our final – and best – root-finding methods, homotopy continuation and monodromy, required numerically solving a differential equation. We will start with what is arguably the simplest differential equation to solve: the integral. Numerical integration was sufficient for both homotopy and monodromy.

By the fundamental theorem of calculus, \[F(x) = \int_{x_0}^x f(t)dt\] is a solution to the differential equation \[F'(x) = f(x).\]

In this sense, integrals are the simplest differential equations, although you’ve doubtless seen that they’re already quite challenging.

You might wonder why numerical calculus leaves out derivatives. The problem is that derivatives are “fragile” in the sense that even slight changes to a function can cause them to change substantially. If you only have a cloud of points, it is very hard to make a sensible definition of an approximate tangent. Integration is far more stable. In fact, it has a smoothing effect, because \[\int_{x_0}^x f(t) dt\] is always continuous when \(f\) is integrable. Even adding a great deal of fuzz to a function will not prevent it from being integrable (unless you go out of your way to make this happen).

13.1 Riemann Integral

The Riemann integral from your calculus class happens to come with a numerical method already!

Given a function \(f\) which is bounded on the interval \([a,b]\), its Riemann integral was defined as follows:

  • Let \(\mathcal P\) be a partition, meaning an ordered list \(a=x_0<x_1<...<x_{n} = b\).
  • Compute a (left) Riemann sum \[S_\mathcal P= \sum_{i=0}^{n-1} f(x_i)(x_{i+1} - x_i)\]
  • Take the limit of the Riemann sums over finer and finer partitions.

A simple choice is the evenly spaced partition. Given an integer \(n\), let \(\Delta x = \frac{b - a}n\) and then define a partition \(\mathcal P_n\) by \(x_k = a + k \Delta x\). In this case, \(x_{i+1}-x_i = \Delta x\). The corresponding partial sum is just \[S_{\mathcal P_n} = \sum_{i=0}^{n-1} f(x_i)\Delta x.\]

For typical functions, the limit of these partial sums is the integral, and so they provide a way to numerically estimate the integral.
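To make this concrete, here is a minimal sketch of the evenly-spaced left Riemann sum (the function name left_riemann and the test integrand are illustrative choices, not fixed by the text):

```python
def left_riemann(f, a, b, n):
    """Estimate the integral of f on [a, b] using n evenly spaced
    left Riemann rectangles."""
    dx = (b - a) / n
    # S = sum of f(x_i) * dx for x_i = a + i * dx, i = 0, ..., n-1.
    return sum(f(a + i * dx) for i in range(n)) * dx

# Example: the integral of x^2 on [0, 1] is 1/3.
print(left_riemann(lambda x: x**2, 0.0, 1.0, 1000))  # ~0.33283
```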

13.1.1 Error

To estimate the error, we’ll start by bounding the error on each “slice”, and then add up all of those errors across the integral (integrating these little errors, if you will).

Lemma 13.1 Let \(f\) be differentiable on \([x,x+\Delta x]\). Then

\[\int_{x}^{x+\Delta x}f(t)dt = f(x)\Delta x + \frac{f'(\zeta)}{2}\Delta x^2\]

for some \(\zeta\in [x, x+\Delta x]\).

Proof. As both the question and the shape of the answer suggest, we will need something like the mean value theorem.

First, let \[F(y) = \int_x^y f(t)dt.\]

By the FTC, \(F' = f\), so \(F\) is twice differentiable. We can therefore apply the Lagrange form of the remainder to its BLA:

\[\begin{align*} F(x+\Delta x) &= F(x) + F'(x)\Delta x + \frac{F''(\zeta)}{2}\Delta x^2,\\ &= 0 + f(x)\Delta x + \frac{f'(\zeta)}{2} \Delta x^2, \end{align*}\]

for some \(\zeta \in [x, x+\Delta x].\)

Theorem 13.1 Let \(f\) be differentiable on \([a,b]\) and let \(S_n\) be the \(n\)th evenly-spaced Riemann estimate.

If \(|f'(x)| \leq M\) on \([a,b]\), then

\[\left|\int_a^b f(x) dx - S_n\right|\leq \frac{M(b-a)^2}{2n}.\]

Proof. Let \(x_i\) be as prescribed by the method, so that \(\Delta x = (b-a)/n\) and \(x_{i+1} = x_i + \Delta x\).

By additivity of the integral,

\[\begin{align*} \int_a^b f(x) dx &= \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x) dx\\ &= \sum_{i=0}^{n-1} \int_{x_i}^{x_i+\Delta x} f(x) dx \end{align*}\]

Meanwhile, \[S_n = \sum_{i=0}^{n-1} f(x_i) \Delta x.\]

Let \(E = \int_a^b f(x) dx - S_n\). Subtracting the sums termwise, we can write \(E = \sum_{i=0}^{n-1} E_i\) where

\[E_i = \int_{x_i}^{x_i + \Delta x} f(x) dx - f(x_i)\Delta x.\]

By the triangle inequality, \[|E| \leq \sum_{i=0}^{n-1} |E_i|.\]

Apply Lemma 13.1 to each of these pieces: \[E_i = \frac{f'(\zeta_i)}{2}\Delta x^2 \] for some \(\zeta_i \in [x_i,x_i + \Delta x] \subseteq [a,b]\).

Therefore, we can estimate \[|E_i| \leq \frac{M}2 \Delta x^2\]

and hence \[|E| \leq \sum_{i=0}^{n-1} \frac{M}2 \Delta x^2 = n\frac{M}2 \Delta x^2.\]

Recall \(\Delta x = (b-a)/n\) to obtain the claimed bound.

Note that a bound \(M\) for \(f'\) always exists when the derivative is continuous, which is not uncommon. This estimate tells us that the error decreases as \(n\) increases, so the method converges. On the other hand, this is pretty slow convergence, even slower than the bisection method: accuracy to \(n\) decimal digits requires on the order of \(M\cdot 10^n\) steps!
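To see this rate in practice, one can tabulate the error as \(n\) grows. A small sketch, reusing the left_riemann function above, with \(\sin\) on \([0,1]\) as an arbitrary test case:

```python
import math

exact = 1 - math.cos(1.0)  # integral of sin on [0, 1]
for n in (10, 100, 1000):
    err = abs(left_riemann(math.sin, 0.0, 1.0, n) - exact)
    print(n, err)  # each tenfold increase in n cuts the error ~tenfold
```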

13.1.2 Euler’s Method

Euler’s method is essentially a rephrasing of the Riemann integral. This method uses the BLA estimate

\[F(x+\Delta x) \approx F(x) + F'(x)\Delta x,\]

to estimate \(F(x)\). Given an initial value \(F(x_0) = F_0\), a step size \(\Delta x\), and the derivative \(F'(x) = f(x)\), inductively define

\[\begin{align*} x_{i+1} &= x_{i} + \Delta x,\\ F_{i+1} &= F_{i} + f(x_i) \Delta x. \end{align*}\]

Then \(F_i\) is an estimate for \(F(x_i)\). If we start from \(x_0 = a\) and \(\Delta x = (b-a)/n\), then \(F_n \approx F(b)\). Flattening the iteration into a sum, we see

\[\begin{align*} F_n &= F_0 + f(x_0) \Delta x + f(x_1) \Delta x + ... + f(x_{n-1}) \Delta x\\ &= F_0 + \sum_{i=0}^{n-1} f(x_i) \Delta x \end{align*}\]

which is \(F(x_0) = F(a)\) plus the \(n\)th Riemann sum for the integral of \(f(x)\) on \([a,b]\). Taking the limit and applying the fundamental theorem of calculus, we see that

\[\begin{align*} \lim_{n\to\infty} F_n &= F(a) + \lim_{n\to\infty}\sum_{i=0}^{n-1} f(x_i) \Delta x\\ &= F(a) + \int_a^b f(x) dx\\ &= F(a) + F(b) - F(a)\\ &= F(b) \end{align*}\]

In the homotopy and monodromy methods, \(x\) was an implicit function of \(t\) and so our expression for \(x'(t)\) also involved \(x\)’s. For nice enough functions this does not affect the limit calculations, but it does mean that the error will be even larger, and the error bound for Euler’s method is already pretty weak. This is why the plain Euler’s method versions of the homotopy and monodromy methods aren’t very accurate.
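To make the iteration concrete, here is a minimal sketch of Euler’s method for a general initial value problem \(x'(t) = g(t, x)\), the form that arises in the homotopy and monodromy methods (the function name and the test equation are illustrative, not from the text):

```python
def euler(g, t0, x0, t1, n):
    """Estimate x(t1) for the IVP x'(t) = g(t, x), x(t0) = x0,
    using n Euler steps of size (t1 - t0) / n."""
    dt = (t1 - t0) / n
    t, x = t0, x0
    for _ in range(n):
        x = x + g(t, x) * dt  # BLA step: x(t + dt) ~ x(t) + x'(t) dt
        t = t + dt
    return x

# Example: x' = x with x(0) = 1, so x(1) should be e ~ 2.71828.
print(euler(lambda t, x: x, 0.0, 1.0, 1.0, 1000))  # ~2.7169
```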

13.2 Polynomial Interpolation

You likely learned the trapezoid rule in calculus as well. The left-handed evenly-spaced Riemann sum uses a \(\Delta x\) by \(f(x)\) rectangle to estimate area, while the trapezoid rule uses the trapezoid with corners \((x,0)\), \((x+\Delta x, 0)\), \((x,f(x))\), and \((x+\Delta x, f(x+\Delta x))\). With a little geometry, the trapezoid’s area turns out to be

\[\frac{f(x) + f(x+\Delta x)}{2}\Delta x\]

leading to a different partial sum,

\[\sum_{i=0}^{n-1} \frac{f(x_i) + f(x_{i+1})}{2} \Delta x.\]

Usually, the trapezoids match the function more closely than the standard rectangles, so one hopes that they decrease the error.

The idea behind both methods is to approximate \(f(t)\) on a small interval \([x,x+\Delta x]\) by something whose area can be calculated exactly. The standard rule uses the constant approximation \(f(t)\approx f(x)\), and the trapezoid rule uses the secant line \[f(t) \approx f(x) + \frac{f(x+\Delta x) - f(x)}{\Delta x}(t-x).\]

Since we do know how to find the area under a polynomial, this suggests expanding our family of approximations in different ways:

  • Higher-degree polynomial interpolations
  • Taylor polynomial approximations, when \(f\) is sufficiently differentiable.

For example, Simpson’s rule (Exercise 13.2) uses quadratic interpolation.

The general process is as follows:

  1. Choose a partition \(a=x_0 < x_1 < ... < x_n = b\) and some degrees \(d_0,...,d_{n-1}\).
  2. On each sub-interval \([x_i,x_{i+1}]\) find a degree \(d_i\) polynomial approximation \(p_i\) to \(f(x)\).
  3. Let \(P_i\) be the integral of \(p_i\) on \([x_i,x_{i+1}]\). This can be found by the power rule alone (so the algorithm isn’t circular!).
  4. Estimate the integral by \[\int_a^b f(x) dx \approx \sum_{i=0}^{n-1} P_i\]

As before, this should converge to the integral of \(f\) as the partition is made finer. We expect that using higher degrees will also improve the accuracy (at the cost of finding the interpolating polynomials). If you know more about your function, you can save computing time but still obtain good results by using higher degrees in places where the function is more complicated, and lower degrees elsewhere.

There are various ways to calculate \(p_i\). For example, we could use a Taylor polynomial of degree \(d_i\) centered at some point in \([x_i,x_{i+1}]\), or we could pick \(d_i+1\) points \(x_{i,0},x_{i,1},...,x_{i,d_i}\) in that interval and use a polynomial interpolation with those nodes.

Given such a choice of nodes, let’s take a look at the interpolating polynomial using the Lagrange basis: \[p_i(x) = \sum_{k=0}^{d_i} f(x_{i,k}) \ell_{i,k}(x).\]

Integrating this, \[\begin{align*} P_i &= \int_{x_i}^{x_{i+1}} \sum_{k=0}^{d_i} f(x_{i,k}) \ell_{i,k}(x)dx,\\ &= \sum_{k=0}^{d_i} f(x_{i,k}) \int_{x_i}^{x_{i+1}} \ell_{i,k}(x)dx. \end{align*}\]

Crucially, using the Lagrange basis reduces us to just integrating \(\ell_{i,k}\), which only depends on the nodes \(x_{i,0},x_{i,1},...,x_{i,d_i}\), rather than the function of interest.

  • If these integrals can be evaluated efficiently, this expansion of the method won’t take too many more steps.
  • Also, if we plan to change the function \(f\) but keep the partition, we can pre-compute those values (see below).
  • When the nodes are evenly-spaced, this is called the Newton-Cotes formula.
  • Note, however, that evenly spaced nodes do not generally produce the best polynomial interpolations, and that this method isn’t a good choice for any function that’s hard to approximate by polynomials.
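As an illustration, here is a sketch that computes the weights \(\int_{x_i}^{x_{i+1}} \ell_{i,k}(x)dx\) for an arbitrary list of nodes by building each Lagrange basis polynomial explicitly and integrating it with the power rule (numpy’s Polynomial class does the bookkeeping; the function name is an assumption of this sketch):

```python
import numpy as np
from numpy.polynomial import Polynomial

def lagrange_weights(nodes, a, b):
    """Integral over [a, b] of each Lagrange basis polynomial for the
    given nodes. The weights depend only on the nodes, never on f."""
    weights = []
    for k, xk in enumerate(nodes):
        ell = Polynomial([1.0])
        for j, xj in enumerate(nodes):
            if j != k:
                ell *= Polynomial([-xj, 1.0]) / (xk - xj)  # (x - xj)/(xk - xj)
        antideriv = ell.integ()  # power rule only, so no circularity
        weights.append(antideriv(b) - antideriv(a))
    return weights

# Evenly spaced nodes with d = 2 on [0, 1] give the weights
# 1/6, 4/6, 1/6 (compare Exercise 13.2).
print(lagrange_weights([0.0, 0.5, 1.0], 0.0, 1.0))
```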

In the case of evenly-spaced nodes, one can use the composition rule/\(u\)-substitution to relate the integrals of our particular \(\ell_{i,k}\) to the corresponding integrals \(L_{d_i,k}\) associated to the Lagrange basis for degree \(d_i\) polynomials on \([0,1]\) with nodes at \(k/d_i\):

\[P_i = \sum_{k=0}^{d_i} f(x_{i,k})\, L_{d_i,k}\, (x_{i+1}-x_i),\]

where the \(L_{d_i,k}\) can often be looked up or computed in advance. Even if you are re-calculating them, this means that as soon as you know their values for one \(P_i\), you can reuse them for all \(P_j\) with the same degree!
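Putting the pieces together, here is a sketch of the whole composite process: one set of reference weights is computed on \([0,1]\) and rescaled to every sub-interval of the same degree, exactly as the \(u\)-substitution above suggests. It assumes the lagrange_weights function from the previous sketch.

```python
import math

def composite_quadrature(f, a, b, n, d):
    """Estimate the integral of f on [a, b]: split [a, b] into n pieces,
    interpolate f at d+1 evenly spaced nodes on each, and integrate."""
    # Reference weights L_{d,k} on [0, 1], reused for every sub-interval.
    L = lagrange_weights([k / d for k in range(d + 1)], 0.0, 1.0)
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        xi = a + i * dx
        # u-substitution: each reference weight is scaled by the width dx.
        total += sum(L[k] * f(xi + (k / d) * dx) for k in range(d + 1)) * dx
    return total

# Degree-4 interpolation on each of 10 pieces, tested on sin over [0, 1].
print(composite_quadrature(math.sin, 0.0, 1.0, 10, 4))  # ~0.4596977 = 1 - cos(1)
```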

13.3 Quadrature

Let’s simplify our notation by dropping the dependence on \(i\) and using \([a,b]\) in place of \([x_i,x_{i+1}]\). Then let \(d\) be the degree, \(p\) the interpolating polynomial, and \(a=x_0 < x_1 <... < x_d = b\) the nodes. In our discussion above, we produced an estimate of the form

\[\int_a^b f(x) dx \approx \int_a^b p(x)dx = \sum_{k=0}^d A_k f(x_k),\]

where the \(A_k\) are the integrals of the Lagrange basis elements at these nodes and do not depend on \(f\). What if we wanted to use a different basis for our space of polynomials? It turns out that we could, and that such a basis would produce the same coefficients \(A_k\). The main reason is that the estimate \(\int_a^b f(x) dx \approx \int_a^b p(x) dx\) is an equality when \(f(x)\) is itself a polynomial of degree at most \(d\).

Therefore, if we have any basis \(q_0(x),...,q_d(x)\), the coefficients \(A_k\) satisfy

\[\begin{align*} &[A_0, A_1, A_2, \dots, A_d] \begin{bmatrix} q_0(x_0) & q_1(x_0) & q_2(x_0) & \cdots & q_d(x_0) \\ q_0(x_1) & q_1(x_1) & q_2(x_1) & \cdots & q_d(x_1) \\ q_0(x_2) & q_1(x_2) & q_2(x_2) & \cdots & q_d(x_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ q_0(x_d) & q_1(x_d) & q_2(x_d) & \cdots & q_d(x_d) \end{bmatrix}\\[1em] = &\left[\int_a^b q_0(x)dx, \int_a^b q_1(x)dx, \int_a^b q_2(x)dx,\;\dots\;,\int_a^b q_d(x)dx\right] \end{align*}\]

because the entries on the left are just

\[\sum_{k=0}^d A_k q_j(x_k) \]

and \(q_j\) is a polynomial of degree at most \(d\), so this is its integral!

This fact lets us find the coefficients \(A_k\) using any basis:

  1. Evaluate the integrals on the right-hand side (this can be hard).
  2. Form the matrix (note that it is the same as the interpolation matrix).
  3. Solve the system.

Unfortunately, there is usually a tradeoff: a basis that gives rise to a system of equations that’s easy to solve probably has harder integrals (the Newton or Lagrange bases), while one whose integrals are easy to evaluate often corresponds to a harder system (the Vandermonde basis).
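For instance, with the monomial basis \(q_j(x) = x^j\), the right-hand integrals follow immediately from the power rule, while the matrix is the (often ill-conditioned) Vandermonde matrix. A sketch under those assumptions:

```python
import numpy as np

def quadrature_weights(nodes, a, b):
    """Solve for the A_k using the monomial basis q_j(x) = x^j."""
    d = len(nodes) - 1
    # M[k, j] = q_j(x_k).  The requirement that sum_k A_k q_j(x_k)
    # equal the integral of q_j for every j reads M^T A = rhs.
    M = np.vander(np.asarray(nodes, dtype=float), d + 1, increasing=True)
    rhs = np.array([(b**(j + 1) - a**(j + 1)) / (j + 1) for j in range(d + 1)])
    return np.linalg.solve(M.T, rhs)

# Same nodes as before: this recovers the weights 1/6, 4/6, 1/6.
print(quadrature_weights([0.0, 0.5, 1.0], 0.0, 1.0))
```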

Stepping back from the particulars of polynomial interpolation, the upshot of this analysis is that for any given nodes \(a=x_0<...<x_d=b\), we can find coefficients \(A_k\) such that

\[\int_a^b f(x)dx \approx \sum_{k=0}^d A_k f(x_k)\]

where the \(A_k\) depend on the \(x_k\), but not on \(f\). This is called quadrature. Other kinds of interpolation can produce other notions of quadrature.

Exercises

Exercise 13.1 Let \(f(x)\) be a function, \(S_n\) its left Riemann sum, and \(T_n\) its trapezoid sum for the same evenly-spaced partition with gap size \(\Delta x\). Show that

\[T_n = S_n + \frac{f(x_n)-f(x_0)}{2}\Delta x.\]

In other words, the trapezoid estimate is only slightly different from the standard Riemann sum, and in a way entirely determined by the endpoints. From this point of view, it is a little surprising that it’s usually an improvement.

Exercise 13.2 Simpson’s rule upgrades the secant line from the trapezoid rule to a quadratic interpolation.

  1. Determine the quadratic function which interpolates \(f(x)\) on \([a,b]\) using \(x=a,\frac {a+b}2,b\).
  2. Integrate that quadratic function to find the area. It should be \[\int_a^b f(x) dx \approx \left(f(a) + 4f\left(\frac{a+b}{2}\right) + f(b) \right) \frac {b-a}{6}\]
  3. Write down the sum for Simpson’s rule on an evenly-spaced partition of size \(n\) with gap size \(\Delta x\).

Programming Problems

Exercise 13.3 Implement the evenly-spaced trapezoid rule for estimating

\[\int_a^b f(x) dx\]

The input should comprise a function \(f\), the endpoints \(a,b\), and the number of pieces \(n\).

Exercise 13.4 Implement Simpson’s rule.